Aims of this tutorial:
It may be long, but it should be easy to complete. The core points investigated here are of high importance and part of the assessable material for the course.
Prerequisites:
Notes:
https://pytorch.org/docs/stable/tensors.html
https://pytorch.org/docs/stable/nn.html
Helper functions are provided in the ./utils folder; they will be used out of the box below.
Loading and inspecting MNIST data. Same as the previous tutorial...
# -*- coding: utf-8 -*-
# The below is for auto-reloading external modules after they are changed, such as those in ./utils.
# Issue: http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
import numpy as np
from utils.data_utils import get_mnist # Helper function. Use it out of the box.
# Constants
DATA_DIR = './data/mnist' # Location we will keep the data.
SEED = 111111
# If datasets are not at specified location, they will be downloaded.
train_imgs, train_lbls = get_mnist(data_dir=DATA_DIR, train=True, download=True)
test_imgs, test_lbls = get_mnist(data_dir=DATA_DIR, train=False, download=True)
print("[train_imgs] Type: ", type(train_imgs), "|| Shape:", train_imgs.shape, "|| Data type: ", train_imgs.dtype )
print("[train_lbls] Type: ", type(train_lbls), "|| Shape:", train_lbls.shape, "|| Data type: ", train_lbls.dtype )
print('Class labels in train = ', np.unique(train_lbls))
print("[test_imgs] Type: ", type(test_imgs), "|| Shape:", test_imgs.shape, " || Data type: ", test_imgs.dtype )
print("[test_lbls] Type: ", type(test_lbls), "|| Shape:", test_lbls.shape, " || Data type: ", test_lbls.dtype )
print('Class labels in test = ', np.unique(test_lbls))
N_tr_imgs = train_imgs.shape[0] # N hereafter. Number of training images in database.
H_height = train_imgs.shape[1] # H hereafter
W_width = train_imgs.shape[2] # W hereafter
C_classes = len(np.unique(train_lbls)) # C hereafter
The autoreload extension is already loaded. To reload it, use: %reload_ext autoreload
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/mnist/MNIST/raw/train-images-idx3-ubyte.gz
Extracting ./data/mnist/MNIST/raw/train-images-idx3-ubyte.gz to ./data/mnist/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/mnist/MNIST/raw/train-labels-idx1-ubyte.gz
Extracting ./data/mnist/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/mnist/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/mnist/MNIST/raw/t10k-images-idx3-ubyte.gz
Extracting ./data/mnist/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/mnist/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/mnist/MNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting ./data/mnist/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/mnist/MNIST/raw
[train_imgs] Type: <class 'numpy.ndarray'> || Shape: (60000, 28, 28) || Data type: uint8
[train_lbls] Type: <class 'numpy.ndarray'> || Shape: (60000,) || Data type: int16
Class labels in train = [0 1 2 3 4 5 6 7 8 9]
[test_imgs] Type: <class 'numpy.ndarray'> || Shape: (10000, 28, 28) || Data type: uint8
[test_lbls] Type: <class 'numpy.ndarray'> || Shape: (10000,) || Data type: int16
Class labels in test = [0 1 2 3 4 5 6 7 8 9]
Above we see that the data have been loaded into numpy arrays.
Arrays with images have shape (N = number of images, H = height, W = width).
Arrays with labels have shape (N = number of images), holding one integer per image: the digit's class.
MNIST comprises a train set of N_tr = 60000 images and a test set of N_te = 10000 images.
We will use the train set for unsupervised learning. The test set will only be used to evaluate the generalisation of classifiers towards the end of the tutorial.
Let's plot a few images in one collage to have a look...
%matplotlib inline
from utils.plotting import plot_grid_of_images # Helper functions, use out of the box.
plot_grid_of_images(train_imgs[0:100], n_imgs_per_row=10)
Notice that the intensities in the images take values from 0 to 255.
A first step in almost all pipelines is to pre-process the data, to make them more appropriate for a model.
Below, we will perform three pre-processing steps:
a) Change the labels from an integer representation to a one-hot representation of the C=10 classes.
b) Re-scale the intensities in the images, from the range [0,255], to be instead in the range [-1,+1].
c) Vectorise the 2D images into 1D vectors for the MLP, which only gets vectors as input.
# a) Change representation of labels to one-hot vectors of length C=10.
train_lbls_onehot = np.zeros(shape=(train_lbls.shape[0], C_classes ) )
train_lbls_onehot[ np.arange(train_lbls_onehot.shape[0]), train_lbls ] = 1
test_lbls_onehot = np.zeros(shape=(test_lbls.shape[0], C_classes ) )
test_lbls_onehot[ np.arange(test_lbls_onehot.shape[0]), test_lbls ] = 1
print("BEFORE: [train_lbls] Type: ", type(train_lbls), "|| Shape:", train_lbls.shape, " || Data type: ", train_lbls.dtype )
print("AFTER : [train_lbls_onehot] Type: ", type(train_lbls_onehot), "|| Shape:", train_lbls_onehot.shape, " || Data type: ", train_lbls_onehot.dtype )
BEFORE: [train_lbls] Type: <class 'numpy.ndarray'> || Shape: (60000,) || Data type: int16
AFTER : [train_lbls_onehot] Type: <class 'numpy.ndarray'> || Shape: (60000, 10) || Data type: float64
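As a side note, the same one-hot construction can be written more compactly by indexing an identity matrix. A minimal sketch with toy labels (not the MNIST arrays):

```python
import numpy as np

labels = np.array([5, 0, 3])  # toy integer labels, C = 10 classes
C = 10

# Fancy-indexing construction, as in the cell above
onehot_a = np.zeros((labels.shape[0], C))
onehot_a[np.arange(labels.shape[0]), labels] = 1

# Equivalent: pick rows of the identity matrix
onehot_b = np.eye(C)[labels]

print(np.array_equal(onehot_a, onehot_b))  # prints True
```

Both give one row per label, with a single 1 at the label's index.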
# b) Re-scale image intensities, from [0,255] to [-1, +1].
# This commonly facilitates learning:
# A zero-centered signal with small magnitude makes it easier to avoid exploding/vanishing gradients.
from utils.data_utils import normalize_int_whole_database # Helper function. Use out of the box.
train_imgs = normalize_int_whole_database(train_imgs, norm_type="minus_1_to_1")
test_imgs = normalize_int_whole_database(test_imgs, norm_type="minus_1_to_1")
# Let's plot one image.
from utils.plotting import plot_image, plot_images # Helper function, use out of the box.
index = 0 # Try any index from 0 up to 59999.
print("Plotting image of index: [", index, "]")
print("Class label for this image is: ", train_lbls[index])
print("One-hot label representation: [", train_lbls_onehot[index], "]")
plot_image(train_imgs[index])
# Notice the magnitude of intensities. Black is now negative and white is positive float.
# Compare with intensities of figure further above.
Plotting image of index: [ 0 ]
Class label for this image is: 5
One-hot label representation: [ [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.] ]
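The helper normalize_int_whole_database is used out of the box above. For reference, a minimal sketch of what a [0,255] to [-1,+1] rescaling amounts to; the helper's exact behaviour (e.g. whether it uses the database-wide min/max) may differ, so treat this as an illustration only:

```python
import numpy as np

def rescale_minus1_to_1(imgs_uint8):
    # Map uint8 intensities [0, 255] linearly to floats in [-1, +1].
    return imgs_uint8.astype(np.float32) / 255.0 * 2.0 - 1.0

x = np.array([[0, 128, 255]], dtype=np.uint8)
print(rescale_minus1_to_1(x))  # black maps to -1.0, white to +1.0
```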
# c) Flatten the images, from 2D matrices to 1D vectors. MLPs take feature-vectors as input, not 2D images.
train_imgs_flat = train_imgs.reshape([train_imgs.shape[0], -1]) # Preserve 1st dim (S = num Samples), flatten others.
test_imgs_flat = test_imgs.reshape([test_imgs.shape[0], -1])
print("Shape of numpy array holding the training database:")
print("Original : [N, H, W] = [", train_imgs.shape , "]")
print("Flattened: [N, H*W] = [", train_imgs_flat.shape , "]")
Shape of numpy array holding the training database:
Original : [N, H, W] = [ (60000, 28, 28) ]
Flattened: [N, H*W] = [ (60000, 784) ]
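Flattening is lossless: the original 2D images can be recovered by reshaping back, which is exactly what the plotting of reconstructions does later. A toy check:

```python
import numpy as np

imgs = np.arange(2 * 28 * 28).reshape(2, 28, 28)  # two fake 28x28 "images"
flat = imgs.reshape(imgs.shape[0], -1)            # [N, H*W], as for the MLP input
restored = flat.reshape(imgs.shape[0], 28, 28)    # back to [N, H, W]
print(flat.shape, np.array_equal(imgs, restored))  # prints (2, 784) True
```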
In this task, you are asked to implement the architecture and losses of a Variational Auto-Encoder. Fill in the blanks where requested to create the architecture below:

# -*- coding: utf-8 -*-
import torch
import torch.optim as optim
import torch.nn as nn
class Network():
    def backward_pass(self, loss):
        # Performs backpropagation and computes gradients.
        # With PyTorch, we do not need to compute gradients analytically for parameters where requires_grad=True.
        # Calling loss.backward(), torch's Autograd automatically computes grads of loss wrt each parameter p, ...
        # ... and **puts them in p.grad**. Return them in a list.
        loss.backward()
        grads = [param.grad for param in self.params]
        return grads
class VAE(Network):
    def __init__(self, rng, D_in, D_hid_enc, D_bottleneck, D_hid_dec):
        # Construct and initialize network parameters.
        # D_in: Dimension of input feature-vectors. Length of a vectorised image.
        D_hid_1 = D_hid_enc  # Dimension of Encoder's hidden layer.
        D_hid_2 = D_bottleneck  # Dimension of the bottleneck Z.
        D_hid_3 = D_hid_dec  # Dimension of Decoder's hidden layer.
        D_out = D_in  # Dimension of Output layer.
        self.D_bottleneck = D_bottleneck  # Keep track of it, we will need it.
        ##### TODO: Initialize the VAE's parameters. Also see forward_pass(...) #####################
        # Dimensions of parameter tensors are (number of neurons + 1) per layer, to account for the +1 bias.
        # -- (Encoder) layer 1
        w1_init = rng.normal(loc=0.0, scale=0.01, size=(D_in + 1, D_hid_1))
        # -- (Encoder) layer 2, predicting p(z|x)
        w2_mu_init = rng.normal(loc=0.0, scale=0.01, size=(D_hid_1 + 1, D_hid_2))  # Weights for predicting means.
        w2_std_init = rng.normal(loc=0.0, scale=0.01, size=(D_hid_1 + 1, D_hid_2))  # <----- weights for predicting std
        # -- (Decoder) layer 3
        w3_init = rng.normal(loc=0.0, scale=0.01, size=(D_hid_2 + 1, D_hid_3))
        # -- (Decoder) layer 4, the output layer
        w4_init = rng.normal(loc=0.0, scale=0.01, size=(D_hid_3 + 1, D_out))
        # PyTorch tensors, the parameters of the model.
        # Use the above numpy arrays of random floats as initialization for the PyTorch weights.
        # (Encoder) Layer 1
        w1 = torch.tensor(w1_init, dtype=torch.float, requires_grad=True)
        # (Encoder) Layer 2, predicting p(z|x)
        w2_mu = torch.tensor(w2_mu_init, dtype=torch.float, requires_grad=True)  # <------- ?????
        w2_std = torch.tensor(w2_std_init, dtype=torch.float, requires_grad=True)
        # (Decoder)
        w3 = torch.tensor(w3_init, dtype=torch.float, requires_grad=True)
        w4 = torch.tensor(w4_init, dtype=torch.float, requires_grad=True)
        #########################################################################################
        # Keep track of all trainable parameters:
        self.params = [w1, w2_mu, w2_std, w3, w4]
    def encode(self, batch_imgs):
        # batch_imgs: Numpy array or PyTorch tensor of shape: [number of inputs, dimensionality of x]
        [w1, w2_mu, w2_std, w3, w4] = self.params
        batch_imgs_t = torch.tensor(batch_imgs, dtype=torch.float) if type(batch_imgs) is np.ndarray else batch_imgs
        unary_feature_for_bias = torch.ones(size=(batch_imgs_t.shape[0], 1))  # [N, 1] column vector.
        x = torch.cat((batch_imgs_t, unary_feature_for_bias), dim=1)  # Extra feature=1 for bias.
        # ========== TODO: Fill in the gaps with the correct parameters of the VAE ========
        # Encoder's Layer 1
        h1_preact = x.mm(w1)
        h1_act = h1_preact.clamp(min=0)
        # Encoder's Layer 2 (predicting p(z|x) of Z coding):
        h1_ext = torch.cat((h1_act, unary_feature_for_bias), dim=1)
        # ... mu
        h2_mu_preact = h1_ext.mm(w2_mu)  # <-------------------------------
        h2_mu_act = h2_mu_preact  # Linear activation for the mean, no ReLU.
        # ... log(std). Why do we do this, instead of directly predicting the std deviation? See lecture slides.
        h2_logstd_preact = h1_ext.mm(w2_std)  # <------------------------
        h2_logstd_act = h2_logstd_preact  # No (linear) activation function in this tutorial, but can use any.
        # ==============================================================================
        z_coding = (h2_mu_act, h2_logstd_act)
        return z_coding
    def decode(self, z_codes):
        # z_codes: Numpy array or PyTorch tensor, shape [N, dimensionality of Z]
        [w1, w2_mu, w2_std, w3, w4] = self.params
        z_codes_t = torch.tensor(z_codes, dtype=torch.float) if type(z_codes) is np.ndarray else z_codes
        unary_feature_for_bias = torch.ones(size=(z_codes_t.shape[0], 1))  # [N, 1] column vector.
        # ========== TODO: Fill in the gaps with the correct parameters of the VAE ========
        # Decoder's 1st layer (Layer 3 of the whole VAE):
        h2_ext = torch.cat((z_codes_t, unary_feature_for_bias), dim=1)
        h3_preact = h2_ext.mm(w3)  # <------------------------
        h3_act = h3_preact.clamp(min=0)
        # Decoder's 2nd layer (Layer 4 of the whole VAE): The output layer.
        h3_ext = torch.cat((h3_act, unary_feature_for_bias), dim=1)
        h4_preact = h3_ext.mm(w4)
        h4_act = torch.tanh(h4_preact)
        # ==============================================================================
        # Output
        x_pred = h4_act
        return x_pred
    def sample_with_reparameterization(self, z_mu, z_logstd):
        # Reparameterization trick to sample from N(mu, var) using N(0, 1) as an intermediate step.
        # param z_mu: Tensor. Mean of the predicted Gaussian p(z|x). Shape: [Num samples, Dimensionality of Z]
        # param z_logstd: Tensor. Log of standard deviation of predicted Gaussian p(z|x). [Num samples, Dim of Z]
        # return: Tensor. [Num samples, Dim of Z]
        N_samples = z_mu.shape[0]
        Z_dims = z_mu.shape[1]
        # ========== TODO: Fill in the gaps to complete the reparameterization trick ========
        z_std = torch.exp(z_logstd)  # <------------------- compute std from log(std)
        eps = torch.randn(size=[N_samples, Z_dims])  # Same as torch.randn_like(z_std)
        z_samples = eps * z_std + z_mu  # <---------------- Reparameterization trick
        # ==============================================================================
        return z_samples
    def forward_pass(self, batch_imgs):
        # Performed at every batch during training.
        # Takes an input batch, encodes it, samples a code from p(z|x) with reparameterization, decodes it.
        # Returns: Reconstruction x_pred, predicted means z_mu, predicted log(std) z_logstd, sampled codes z_samples.
        batch_imgs_t = torch.tensor(batch_imgs, dtype=torch.float)  # Makes a numpy array into a PyTorch tensor.
        # ========== TODO: Call the appropriate functions, as you defined them above ========
        # Encoder
        z_mu, z_logstd = self.encode(batch_imgs_t)  # <----------------------- ????????????
        z_samples = self.sample_with_reparameterization(z_mu, z_logstd)  # <------------- ????????????
        # Decoder
        x_pred = self.decode(z_samples)  # <------------- ????????????
        # ===================================================================================
        return (x_pred, z_mu, z_logstd, z_samples)
def reconstruction_loss(x_pred, x_real, eps=1e-7):
    # x_pred: [N, D_out] Prediction returned by forward_pass. Numpy array or tensor of shape [N, D_out]
    # x_real: [N, D_in]
    # If a numpy array is given, change it to a PyTorch tensor.
    x_pred = torch.tensor(x_pred, dtype=torch.float) if type(x_pred) is np.ndarray else x_pred
    x_real = torch.tensor(x_real, dtype=torch.float) if type(x_real) is np.ndarray else x_real
    ######## TODO: Complete the calculation of the Reconstruction loss for each sample ###########
    loss_recon = torch.mean(torch.square(x_pred - x_real), dim=1)  # <---------- same as for AEs
    ##########################################################################################
    cost = torch.mean(loss_recon, dim=0)  # Expectation of loss: Mean over samples (dim=0).
    return cost
def regularizer_loss(mu, log_std):
    # mu: Tensor, [number of samples, dimensionality of Z]. Predicted mean per z dimension.
    # log_std: Tensor, [number of samples, dimensionality of Z]. Predicted log(std.dev.) per z dimension.
    ######## TODO: Complete the calculation of the Regularizer for each sample ###########
    std = torch.exp(log_std)  # Compute std.dev. from log(std.dev.)
    reg_loss_per_sample = 0.5 * torch.sum(mu**2 + std**2 - 1 - 2 * log_std, dim=1)  # <------ See lecture slides
    reg_loss = torch.mean(reg_loss_per_sample, dim=0)  # Mean over samples.
    ##########################################################################################
    return reg_loss
def vae_loss(x_real, x_pred, z_mu, z_logstd, lambda_rec=1., lambda_reg=0.005, eps=1e-7):
    rec_loss = reconstruction_loss(x_pred, x_real, eps=eps)
    reg_loss = regularizer_loss(z_mu, z_logstd)
    ################### TODO: compute the total loss: #####################################
    # ... by weighting the reconstruction loss by lambda_rec, and the Regularizer by lambda_reg.
    weighted_rec_loss = lambda_rec * rec_loss
    weighted_reg_loss = lambda_reg * reg_loss
    total_loss = weighted_rec_loss + weighted_reg_loss
    #######################################################################################
    return total_loss, weighted_rec_loss, weighted_reg_loss
If this task is completed correctly, you should be able to run the cell without errors, though there will be no output yet. We will use this in the next task, and then we will find out whether everything went well :-)
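The regularizer implemented above is the closed-form KL divergence KL( N(mu, std^2) || N(0, 1) ), summed over the dimensions of Z. As a sanity check, independent of the class above (numpy only), the per-dimension term 0.5*(mu^2 + std^2 - 1 - 2*log_std) can be compared against the textbook form of the Gaussian KL, log(1/std) + (std^2 + mu^2)/2 - 1/2; the two are algebraically identical:

```python
import numpy as np

rng = np.random.RandomState(0)
mu = rng.normal(size=5)                 # random means per z dimension
log_std = rng.normal(scale=0.3, size=5)  # random log(std.dev.) per z dimension
std = np.exp(log_std)

# Form used in regularizer_loss above (per dimension)
kl_a = 0.5 * (mu**2 + std**2 - 1 - 2 * log_std)

# Textbook KL( N(mu, std^2) || N(0, 1) ) per dimension
kl_b = np.log(1.0 / std) + (std**2 + mu**2) / 2.0 - 0.5

print(np.allclose(kl_a, kl_b))  # prints True: the two expressions agree
```

Note that every term is non-negative as a KL divergence must be, and it is zero exactly when mu = 0 and std = 1, i.e. when p(z|x) matches the prior N(0, 1).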
Below you are given the main training function, which performs gradient descent in an unsupervised fashion.
In the code below, a random batch of images is given to the VAE for a forward_pass (encode, sampling via the reparameterization trick, decode). For each sample x, it returns the reconstruction x_pred, the predicted mean and log standard deviation of the distribution p(z|x) over codes z, and the code z passed to the decoder, which here is a sample drawn from the predicted p(z|x).
Then, the total loss of the VAE is calculated via vae_loss(), implemented above, and minimized via Adam.
Fill in the 2 blanks in the code, simply passing the correct parameters (predicted means (mu) and log(std.dev.)) to the loss function vae_loss(), so that it can be optimized.
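The reparameterization step used inside forward_pass can be checked in isolation: samples mu + exp(logstd) * eps, with eps drawn from N(0, 1), should empirically have roughly the requested mean and standard deviation. A numpy stand-in for the torch code above (the scalar targets here are arbitrary):

```python
import numpy as np

rng = np.random.RandomState(111111)
mu, log_std = 2.0, np.log(0.5)       # target distribution N(2, 0.5^2)

eps = rng.randn(100000)              # draws from N(0, 1)
z = mu + np.exp(log_std) * eps       # reparameterized draws from N(mu, std^2)

print(z.mean(), z.std())  # approximately 2.0 and 0.5
```

Because z is a deterministic function of mu and log_std (the randomness lives only in eps), gradients can flow from the loss back into the encoder, which is the whole point of the trick.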
from utils.plotting import plot_train_progress_VAE, plot_grids_of_images # Use out of the box
def get_random_batch(train_imgs, train_lbls, batch_size, rng):
    # train_imgs: Images. Numpy array of shape [N, H * W]
    # train_lbls: Labels of images. None, or numpy array of shape [N, C_classes], one-hot label for each image.
    # batch_size: Integer. Size that the batch should have.
    indices = rng.randint(low=0, high=train_imgs.shape[0], size=batch_size, dtype='int32')
    train_imgs_batch = train_imgs[indices]
    if train_lbls is not None:  # Enables the function to be used both for supervised and unsupervised learning.
        train_lbls_batch = train_lbls[indices]
    else:
        train_lbls_batch = None
    return [train_imgs_batch, train_lbls_batch]
def unsupervised_training_VAE(net,
                              loss_func,
                              lambda_rec,
                              lambda_reg,
                              rng,
                              train_imgs_all,
                              batch_size,
                              learning_rate,
                              total_iters,
                              iters_per_recon_plot=-1):
    # net: Instance of a model. See classes: Autoencoder, MLPClassifier, etc. further below.
    # loss_func: Function that computes the loss. See functions: reconstruction_loss or cross_entropy.
    # lambda_rec: Weight of the reconstruction loss in the total loss: Total = lambda_rec * rec_loss + lambda_reg * reg_loss
    # lambda_reg: Same as above, but for the regularizer.
    # rng: Numpy random number generator.
    # train_imgs_all: All the training images. Numpy array, shape [N_tr, H, W]
    # batch_size: Size of the batch that should be processed per SGD iteration by a model.
    # learning_rate: Self explanatory.
    # total_iters: How many SGD iterations to perform.
    # iters_per_recon_plot: Integer. Every that many iterations the model predicts training images ...
    # ... and we plot their reconstructions, for visual inspection of the results.
    loss_total_to_plot = []
    loss_rec_to_plot = []
    loss_reg_to_plot = []
    optimizer = optim.Adam(net.params, lr=learning_rate)  # Use PyTorch's Adam optimizer out of the box.
    for t in range(total_iters):
        # Sample batch for this SGD iteration
        x_batch, _ = get_random_batch(train_imgs_all, None, batch_size, rng)
        ################### TODO: compute the total loss: ################################################
        # Pass the parameters of the predicted distribution per x (mean mu and log(std.dev.)) to the loss function.
        # Forward pass: Encodes, samples via the reparameterization trick, decodes.
        x_pred, z_mu, z_logstd, z_codes = net.forward_pass(x_batch)
        # Compute loss:
        total_loss, rec_loss, reg_loss = loss_func(x_batch, x_pred, z_mu, z_logstd, lambda_rec, lambda_reg)  # <-------------
        ####################################################################################################
        # Pytorch way
        optimizer.zero_grad()
        _ = net.backward_pass(total_loss)
        optimizer.step()
        # ==== Report training loss and accuracy ======
        total_loss_np = total_loss if isinstance(total_loss, float) else total_loss.item()  # PyTorch returns a tensor. Cast to float.
        rec_loss_np = rec_loss if isinstance(rec_loss, float) else rec_loss.item()
        reg_loss_np = reg_loss if isinstance(reg_loss, float) else reg_loss.item()
        if t % 10 == 0:  # Print every 10 iterations
            print("[iter:", t, "]: Total training Loss: {0:.2f}".format(total_loss_np))
        loss_total_to_plot.append(total_loss_np)
        loss_rec_to_plot.append(rec_loss_np)
        loss_reg_to_plot.append(reg_loss_np)
        # Every few iterations, show reconstructions
        if t == total_iters - 1 or t % iters_per_recon_plot == 0:
            # Reconstruct all images, to plot reconstructions.
            x_pred_all, z_mu_all, z_logstd_all, z_codes_all = net.forward_pass(train_imgs_all)
            # Cast tensors to numpy arrays
            x_pred_all_np = x_pred_all if type(x_pred_all) is np.ndarray else x_pred_all.detach().numpy()
            # Predicted reconstructions have vector shape. Reshape them to the original image shape.
            train_imgs_resh = train_imgs_all.reshape([train_imgs_all.shape[0], H_height, W_width])
            x_pred_all_np_resh = x_pred_all_np.reshape([train_imgs_all.shape[0], H_height, W_width])
            # Plot a few images, originals and predicted reconstructions.
            plot_grids_of_images([train_imgs_resh[0:100], x_pred_all_np_resh[0:100]],
                                 titles=["Real", "Reconstructions"],
                                 n_imgs_per_row=10,
                                 dynamically=True)
    # At the end of the process, plot the loss curves.
    plot_train_progress_VAE(loss_total_to_plot, loss_rec_to_plot, loss_reg_to_plot, iters_per_point=1, y_lims=[1., 1., None])
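The batch sampler get_random_batch above simply draws indices with replacement and gathers the corresponding rows. A standalone sketch of the same idea with toy data:

```python
import numpy as np

rng = np.random.RandomState(111111)
imgs = np.arange(20).reshape(10, 2)  # 10 toy "images" of 2 features each
lbls = np.arange(10)                 # one toy label per image

batch_size = 4
# Draw batch_size random indices in [0, N), with replacement
indices = rng.randint(low=0, high=imgs.shape[0], size=batch_size)
imgs_batch, lbls_batch = imgs[indices], lbls[indices]

print(imgs_batch.shape, lbls_batch.shape)  # prints (4, 2) (4,)
```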
If you completed the above correctly, you should get no error message here. Finally, let's use the above, together with the VAE implementation from Task 1, to train a VAE!
Fill in the gap below to build the VAE shown in the figure of Task 1, with a 2-dimensional Z representation...
##################### TODO: Fill in the blank ##############################
# Create the network
rng = np.random.RandomState(seed=SEED)
vae = VAE(rng=rng,
D_in=H_height*W_width,
D_hid_enc=256,
D_bottleneck=2, # <--- Set to correct value for instantiating VAE shown & implemented in Task 1. Note: We treat D as dimensionality of Z, rather than number of neurons.
D_hid_dec=256)
########################################################################
# Start training
unsupervised_training_VAE(vae,
vae_loss,
lambda_rec=1.0, # <-------- lambda_rec, weight on reconstruction loss.
lambda_reg=0.005, # <------- lambda_reg, weight on regularizer. 0.005 works ok.
rng=rng,
train_imgs_all=train_imgs_flat,
batch_size=40,
learning_rate=3e-3,
total_iters=1000,
iters_per_recon_plot=50)
[iter: 0 ]: Total training Loss: 0.93
[iter: 10 ]: Total training Loss: 0.55 [iter: 20 ]: Total training Loss: 0.28 [iter: 30 ]: Total training Loss: 0.31 [iter: 40 ]: Total training Loss: 0.27 [iter: 50 ]: Total training Loss: 0.28
[iter: 60 ]: Total training Loss: 0.28 [iter: 70 ]: Total training Loss: 0.27 [iter: 80 ]: Total training Loss: 0.26 [iter: 90 ]: Total training Loss: 0.27 [iter: 100 ]: Total training Loss: 0.26
[iter: 110 ]: Total training Loss: 0.24 [iter: 120 ]: Total training Loss: 0.26 [iter: 130 ]: Total training Loss: 0.26 [iter: 140 ]: Total training Loss: 0.25 [iter: 150 ]: Total training Loss: 0.23
[iter: 160 ]: Total training Loss: 0.24 [iter: 170 ]: Total training Loss: 0.23 [iter: 180 ]: Total training Loss: 0.24 [iter: 190 ]: Total training Loss: 0.23 [iter: 200 ]: Total training Loss: 0.28
[iter: 210 ]: Total training Loss: 0.27 [iter: 220 ]: Total training Loss: 0.25 [iter: 230 ]: Total training Loss: 0.23 [iter: 240 ]: Total training Loss: 0.24 [iter: 250 ]: Total training Loss: 0.23
[iter: 260 ]: Total training Loss: 0.25 [iter: 270 ]: Total training Loss: 0.26 [iter: 280 ]: Total training Loss: 0.24 [iter: 290 ]: Total training Loss: 0.24 [iter: 300 ]: Total training Loss: 0.23
[iter: 310 ]: Total training Loss: 0.24 [iter: 320 ]: Total training Loss: 0.27 [iter: 330 ]: Total training Loss: 0.24 [iter: 340 ]: Total training Loss: 0.24 [iter: 350 ]: Total training Loss: 0.23
[iter: 360 ]: Total training Loss: 0.25 [iter: 370 ]: Total training Loss: 0.25 [iter: 380 ]: Total training Loss: 0.24 [iter: 390 ]: Total training Loss: 0.24 [iter: 400 ]: Total training Loss: 0.25
[iter: 410 ]: Total training Loss: 0.25 [iter: 420 ]: Total training Loss: 0.22 [iter: 430 ]: Total training Loss: 0.25 [iter: 440 ]: Total training Loss: 0.24 [iter: 450 ]: Total training Loss: 0.25
[iter: 460 ]: Total training Loss: 0.23 [iter: 470 ]: Total training Loss: 0.23 [iter: 480 ]: Total training Loss: 0.23 [iter: 490 ]: Total training Loss: 0.23 [iter: 500 ]: Total training Loss: 0.23
[iter: 510 ]: Total training Loss: 0.23 [iter: 520 ]: Total training Loss: 0.25 [iter: 530 ]: Total training Loss: 0.24 [iter: 540 ]: Total training Loss: 0.24 [iter: 550 ]: Total training Loss: 0.25
[iter: 560 ]: Total training Loss: 0.22 [iter: 570 ]: Total training Loss: 0.23 [iter: 580 ]: Total training Loss: 0.24 [iter: 590 ]: Total training Loss: 0.25 [iter: 600 ]: Total training Loss: 0.21
[iter: 610 ]: Total training Loss: 0.25 [iter: 620 ]: Total training Loss: 0.23 [iter: 630 ]: Total training Loss: 0.24 [iter: 640 ]: Total training Loss: 0.22 [iter: 650 ]: Total training Loss: 0.23
[iter: 660 ]: Total training Loss: 0.24 [iter: 670 ]: Total training Loss: 0.25 [iter: 680 ]: Total training Loss: 0.20 [iter: 690 ]: Total training Loss: 0.24 [iter: 700 ]: Total training Loss: 0.23
[iter: 710 ]: Total training Loss: 0.25 [iter: 720 ]: Total training Loss: 0.23 [iter: 730 ]: Total training Loss: 0.21 [iter: 740 ]: Total training Loss: 0.25 [iter: 750 ]: Total training Loss: 0.23
[iter: 760 ]: Total training Loss: 0.21 [iter: 770 ]: Total training Loss: 0.24 [iter: 780 ]: Total training Loss: 0.22 [iter: 790 ]: Total training Loss: 0.23 [iter: 800 ]: Total training Loss: 0.23
[iter: 810 ]: Total training Loss: 0.24 [iter: 820 ]: Total training Loss: 0.23 [iter: 830 ]: Total training Loss: 0.22 [iter: 840 ]: Total training Loss: 0.24 [iter: 850 ]: Total training Loss: 0.23
[iter: 860 ]: Total training Loss: 0.24 [iter: 870 ]: Total training Loss: 0.22 [iter: 880 ]: Total training Loss: 0.22 [iter: 890 ]: Total training Loss: 0.24 [iter: 900 ]: Total training Loss: 0.22
[iter: 910 ]: Total training Loss: 0.22 [iter: 920 ]: Total training Loss: 0.23 [iter: 930 ]: Total training Loss: 0.25 [iter: 940 ]: Total training Loss: 0.22 [iter: 950 ]: Total training Loss: 0.22
[iter: 960 ]: Total training Loss: 0.22 [iter: 970 ]: Total training Loss: 0.23 [iter: 980 ]: Total training Loss: 0.22 [iter: 990 ]: Total training Loss: 0.22
The above requires you to have completed both Task 1 and Task 2. If everything is completed correctly, you should see the model being trained, with the total training loss printed every few iterations.
At the end of training, after 1000 iterations, you will see three curves: one for the TOTAL training loss, one for the weighted reconstruction loss, and one for the weighted regularization loss (weighted = after multiplication with the weights lambda_rec and lambda_reg respectively, when computing the total loss). If everything is done well, the total and reconstruction losses are expected to decrease to approximately 0.25, and the regularizer to approximately 0.01-0.02.
You should also see a set of real images and their reconstructed versions printed side by side.
By the end, the reconstructions should start to look reasonable.
We now have a trained VAE with 2-dimensional representation Z from Task 2. We will here use it to encode training data and obtain the means and standard deviations of the predicted distributions for the codes p(z|x). We will then plot the predicted means for p(z|x) for each sample x, in a 2D plot, to observe how codes are clustered.
Note: Fill in the 1 blank below, run the code, and observe output...
import matplotlib.pyplot as plt
def encode_training_images(net,
                           imgs_flat,
                           lbls,
                           batch_size,
                           total_iterations=None,
                           plot_2d_embedding=True,
                           plot_hist_mu_std_for_dim=0):
    # This function encodes images, plots the first 2 dimensions of the codes in a plot, and finally ...
    # ... returns the minimum and maximum values of the codes for each dimension of Z.
    # ... We will use this at a later task.
    # Arguments:
    # imgs_flat: Numpy array of shape [Number of images, H * W]
    # lbls: Numpy array of shape [number of images], with 1 integer per image. The integer is the class (digit).
    # total_iterations: How many batches to encode. We use this so that we don't encode and plot ...
    # ... the whole training database, because the plot would get cluttered with 60000 points.
    # If total_iterations is None, the function iterates over all the data, breaking them into batches.
    if total_iterations is None:
        total_iterations = (imgs_flat.shape[0] - 1) // batch_size + 1
    z_mu_all = []
    z_std_all = []
    lbls_all = []
    for t in range(total_iterations):
        # Get the batch for this iteration
        x_batch = imgs_flat[t * batch_size: (t + 1) * batch_size]
        lbls_batch = lbls[t * batch_size: (t + 1) * batch_size]  # Just to color the embeddings (z codes) in the plot.
        ####### TODO: Fill in the blank ##################################
        # Encode a batch of x inputs:
        z_mu, z_logstd = net.encode(x_batch)  # <------------------------
        #################################################################
        z_mu_np = z_mu if type(z_mu) is np.ndarray else z_mu.detach().numpy()
        z_logstd_np = z_logstd if type(z_logstd) is np.ndarray else z_logstd.detach().numpy()
        z_mu_all.append(z_mu_np)
        z_std_all.append(np.exp(z_logstd_np))
        lbls_all.append(lbls_batch)
    z_mu_all = np.concatenate(z_mu_all)  # Concatenate the list of arrays along dim=0 (image index).
    z_std_all = np.concatenate(z_std_all)
    lbls_all = np.concatenate(lbls_all)
    if plot_2d_embedding:
        print("Z-Space and the MEAN of the predicted p(z|x) for each sample (std.devs not shown)")
        # Plot the codes with a different color per class in a scatter plot:
        plt.scatter(z_mu_all[:, 0], z_mu_all[:, 1], c=lbls_all, alpha=0.5)  # Plot the first 2 dimensions.
        plt.show()
    print("Histogram of values of the predicted MEANS")
    plt.hist(z_mu_all[:, plot_hist_mu_std_for_dim], bins=20)
    plt.show()
    print("Histogram of values of the predicted STANDARD DEVIATIONS")
    plt.hist(z_std_all[:, plot_hist_mu_std_for_dim], bins=20)
    plt.show()
# Encode and plot
encode_training_images(vae,
train_imgs_flat,
train_lbls,
batch_size=100,
total_iterations=200,
plot_2d_embedding=True,
plot_hist_mu_std_for_dim=1)
Z-Space and the MEAN of the predicted p(z|x) for each sample (std.devs not shown)
Histogram of values of the predicted MEANS
Histogram of values of the predicted STANDARD DEVIATIONS
If all went well, you should see three plots:
Questions:
The code below is complete. Just run it and compare results with those of previous Tasks, where the VAE was trained both with a reconstruction and the regularizer.
# Create the network
rng = np.random.RandomState(seed=SEED)
vae_2 = VAE(rng=rng,
D_in=H_height*W_width,
D_hid_enc=256,
D_bottleneck=2,
D_hid_dec=256)
# Start training
unsupervised_training_VAE(vae_2,
vae_loss,
lambda_rec=1.0,
lambda_reg=0.0, # Essentially not minimizing regularizer. Only reconstruction.
rng=rng,
train_imgs_all=train_imgs_flat,
batch_size=40,
learning_rate=3e-3,
total_iters=1000,
iters_per_recon_plot=50)
[iter: 0 ]: Total training Loss: 0.93
[iter: 10 ]: Total training Loss: 0.56 [iter: 20 ]: Total training Loss: 0.51 [iter: 30 ]: Total training Loss: 0.60 [iter: 40 ]: Total training Loss: 0.56 [iter: 50 ]: Total training Loss: 0.58
[iter: 60 ]: Total training Loss: 0.57 [iter: 70 ]: Total training Loss: 0.59 [iter: 80 ]: Total training Loss: 0.57 [iter: 90 ]: Total training Loss: 0.57 [iter: 100 ]: Total training Loss: 0.58
[iter: 110 ]: Total training Loss: 0.56 [iter: 120 ]: Total training Loss: 0.59 [iter: 130 ]: Total training Loss: 0.55 [iter: 140 ]: Total training Loss: 0.57 [iter: 150 ]: Total training Loss: 0.55
[iter: 160 ]: Total training Loss: 0.55 [iter: 170 ]: Total training Loss: 0.55 [iter: 180 ]: Total training Loss: 0.57 [iter: 190 ]: Total training Loss: 0.56 [iter: 200 ]: Total training Loss: 0.62
[iter: 210 ]: Total training Loss: 0.60 [iter: 220 ]: Total training Loss: 0.57 [iter: 230 ]: Total training Loss: 0.52 [iter: 240 ]: Total training Loss: 0.57 [iter: 250 ]: Total training Loss: 0.57
[iter: 260 ]: Total training Loss: 0.59 [iter: 270 ]: Total training Loss: 0.59 [iter: 280 ]: Total training Loss: 0.57 [iter: 290 ]: Total training Loss: 0.54 [iter: 300 ]: Total training Loss: 0.57
[iter: 310 ]: Total training Loss: 0.59 [iter: 320 ]: Total training Loss: 0.61 [iter: 330 ]: Total training Loss: 0.58 [iter: 340 ]: Total training Loss: 0.58 [iter: 350 ]: Total training Loss: 0.56
[iter: 360 ]: Total training Loss: 0.60 [iter: 370 ]: Total training Loss: 0.60 [iter: 380 ]: Total training Loss: 0.57 [iter: 390 ]: Total training Loss: 0.59 [iter: 400 ]: Total training Loss: 0.55
[iter: 410 ]: Total training Loss: 0.58 [iter: 420 ]: Total training Loss: 0.55 [iter: 430 ]: Total training Loss: 0.59 [iter: 440 ]: Total training Loss: 0.57 [iter: 450 ]: Total training Loss: 0.58
[iter: 460 ]: Total training Loss: 0.56 [iter: 470 ]: Total training Loss: 0.55 [iter: 480 ]: Total training Loss: 0.55 [iter: 490 ]: Total training Loss: 0.57 [iter: 500 ]: Total training Loss: 0.57
[iter: 510 ]: Total training Loss: 0.58 [iter: 520 ]: Total training Loss: 0.60 [iter: 530 ]: Total training Loss: 0.56 [iter: 540 ]: Total training Loss: 0.56 [iter: 550 ]: Total training Loss: 0.60
[iter: 560 ]: Total training Loss: 0.54 [iter: 570 ]: Total training Loss: 0.55 [iter: 580 ]: Total training Loss: 0.57 [iter: 590 ]: Total training Loss: 0.58 [iter: 600 ]: Total training Loss: 0.55
[iter: 610 ]: Total training Loss: 0.60 [iter: 620 ]: Total training Loss: 0.58 [iter: 630 ]: Total training Loss: 0.58 [iter: 640 ]: Total training Loss: 0.57 [iter: 650 ]: Total training Loss: 0.60
[iter: 660 ]: Total training Loss: 0.58 [iter: 670 ]: Total training Loss: 0.59 [iter: 680 ]: Total training Loss: 0.55 [iter: 690 ]: Total training Loss: 0.59 [iter: 700 ]: Total training Loss: 0.56
[iter: 710 ]: Total training Loss: 0.58 [iter: 720 ]: Total training Loss: 0.56 [iter: 730 ]: Total training Loss: 0.55 [iter: 740 ]: Total training Loss: 0.56 [iter: 750 ]: Total training Loss: 0.56
[iter: 760 ]: Total training Loss: 0.57 [iter: 770 ]: Total training Loss: 0.58 [iter: 780 ]: Total training Loss: 0.57 [iter: 790 ]: Total training Loss: 0.59 [iter: 800 ]: Total training Loss: 0.60
[iter: 810 ]: Total training Loss: 0.56 [iter: 820 ]: Total training Loss: 0.57 [iter: 830 ]: Total training Loss: 0.53 [iter: 840 ]: Total training Loss: 0.59 [iter: 850 ]: Total training Loss: 0.58
[iter: 860 ]: Total training Loss: 0.59 [iter: 870 ]: Total training Loss: 0.57 [iter: 880 ]: Total training Loss: 0.57 [iter: 890 ]: Total training Loss: 0.56 [iter: 900 ]: Total training Loss: 0.57
[iter: 910 ]: Total training Loss: 0.53 [iter: 920 ]: Total training Loss: 0.55 [iter: 930 ]: Total training Loss: 0.59 [iter: 940 ]: Total training Loss: 0.60 [iter: 950 ]: Total training Loss: 0.57
[iter: 960 ]: Total training Loss: 0.57 [iter: 970 ]: Total training Loss: 0.57 [iter: 980 ]: Total training Loss: 0.59 [iter: 990 ]: Total training Loss: 0.55
Questions:
# Encode and plot
encode_training_images(vae_2, # The second VAE, trained only with Reconstruction loss.
train_imgs_flat,
train_lbls,
batch_size=100,
total_iterations=200,
plot_2d_embedding=True,
plot_hist_mu_std_for_dim=1)
Z-Space and the MEAN of the predicted p(z|x) for each sample (std.devs not shown)
Histogram of values of the predicted MEANS
Histogram of values of the predicted STANDARD DEVIATIONS
Questions:
The code below is complete. Just run it and compare results with those of previous Tasks 2,3,4.
# Create the network
rng = np.random.RandomState(seed=SEED)
vae_3 = VAE(rng=rng,
D_in=H_height*W_width,
D_hid_enc=256,
D_bottleneck=2,
D_hid_dec=256)
# Start training
unsupervised_training_VAE(vae_3,
vae_loss,
lambda_rec=0.0, # <------- No reconstruction loss. Only regularizer
lambda_reg=0.005,
rng=rng,
train_imgs_all=train_imgs_flat,
batch_size=40,
learning_rate=3e-3,
total_iters=1000,
iters_per_recon_plot=50)
[iter: 0 ]: Total training Loss: 0.00
[iter: 10 ]: Total training Loss: 0.00 [iter: 20 ]: Total training Loss: 0.00 [iter: 30 ]: Total training Loss: 0.00 [iter: 40 ]: Total training Loss: 0.00 [iter: 50 ]: Total training Loss: 0.00
[iter: 60 ]: Total training Loss: 0.00 [iter: 70 ]: Total training Loss: 0.00 [iter: 80 ]: Total training Loss: 0.00 [iter: 90 ]: Total training Loss: 0.00 [iter: 100 ]: Total training Loss: 0.00
[iter: 110 ]: Total training Loss: 0.00 [iter: 120 ]: Total training Loss: 0.00 [iter: 130 ]: Total training Loss: 0.00 [iter: 140 ]: Total training Loss: 0.00 [iter: 150 ]: Total training Loss: 0.00
[iter: 160 ]: Total training Loss: 0.00 [iter: 170 ]: Total training Loss: 0.00 [iter: 180 ]: Total training Loss: 0.00 [iter: 190 ]: Total training Loss: 0.00 [iter: 200 ]: Total training Loss: 0.00
[iter: 210 ]: Total training Loss: 0.00 [iter: 220 ]: Total training Loss: 0.00 [iter: 230 ]: Total training Loss: 0.00 [iter: 240 ]: Total training Loss: 0.00 [iter: 250 ]: Total training Loss: 0.00
[iter: 260 ]: Total training Loss: 0.00 [iter: 270 ]: Total training Loss: 0.00 [iter: 280 ]: Total training Loss: 0.00 [iter: 290 ]: Total training Loss: 0.00 [iter: 300 ]: Total training Loss: 0.00
[iter: 310 ]: Total training Loss: 0.00 [iter: 320 ]: Total training Loss: 0.00 [iter: 330 ]: Total training Loss: 0.00 [iter: 340 ]: Total training Loss: 0.00 [iter: 350 ]: Total training Loss: 0.00
[iter: 360 ]: Total training Loss: 0.00 [iter: 370 ]: Total training Loss: 0.00 [iter: 380 ]: Total training Loss: 0.00 [iter: 390 ]: Total training Loss: 0.00 [iter: 400 ]: Total training Loss: 0.00
[iter: 410 ]: Total training Loss: 0.00 [iter: 420 ]: Total training Loss: 0.00 [iter: 430 ]: Total training Loss: 0.00 [iter: 440 ]: Total training Loss: 0.00 [iter: 450 ]: Total training Loss: 0.00
[iter: 460 ]: Total training Loss: 0.00 [iter: 470 ]: Total training Loss: 0.00 [iter: 480 ]: Total training Loss: 0.00 [iter: 490 ]: Total training Loss: 0.00 [iter: 500 ]: Total training Loss: 0.00
[iter: 510 ]: Total training Loss: 0.00 [iter: 520 ]: Total training Loss: 0.00 [iter: 530 ]: Total training Loss: 0.00 [iter: 540 ]: Total training Loss: 0.00 [iter: 550 ]: Total training Loss: 0.00
[iter: 560 ]: Total training Loss: 0.00 [iter: 570 ]: Total training Loss: 0.00 [iter: 580 ]: Total training Loss: 0.00 [iter: 590 ]: Total training Loss: 0.00 [iter: 600 ]: Total training Loss: 0.00
[iter: 610 ]: Total training Loss: 0.00 [iter: 620 ]: Total training Loss: 0.00 [iter: 630 ]: Total training Loss: 0.00 [iter: 640 ]: Total training Loss: 0.00 [iter: 650 ]: Total training Loss: 0.00
[iter: 660 ]: Total training Loss: 0.00 [iter: 670 ]: Total training Loss: 0.00 [iter: 680 ]: Total training Loss: 0.00 [iter: 690 ]: Total training Loss: 0.00 [iter: 700 ]: Total training Loss: 0.00
[iter: 710 ]: Total training Loss: 0.00 [iter: 720 ]: Total training Loss: 0.00 [iter: 730 ]: Total training Loss: 0.00 [iter: 740 ]: Total training Loss: 0.00 [iter: 750 ]: Total training Loss: 0.00
[iter: 760 ]: Total training Loss: 0.00 [iter: 770 ]: Total training Loss: 0.00 [iter: 780 ]: Total training Loss: 0.00 [iter: 790 ]: Total training Loss: 0.00 [iter: 800 ]: Total training Loss: 0.00
[iter: 810 ]: Total training Loss: 0.00 [iter: 820 ]: Total training Loss: 0.00 [iter: 830 ]: Total training Loss: 0.00 [iter: 840 ]: Total training Loss: 0.00 [iter: 850 ]: Total training Loss: 0.00
[iter: 860 ]: Total training Loss: 0.00 [iter: 870 ]: Total training Loss: 0.00 [iter: 880 ]: Total training Loss: 0.00 [iter: 890 ]: Total training Loss: 0.00 [iter: 900 ]: Total training Loss: 0.00
[iter: 910 ]: Total training Loss: 0.00 [iter: 920 ]: Total training Loss: 0.00 [iter: 930 ]: Total training Loss: 0.00 [iter: 940 ]: Total training Loss: 0.00 [iter: 950 ]: Total training Loss: 0.00
[iter: 960 ]: Total training Loss: 0.00 [iter: 970 ]: Total training Loss: 0.00 [iter: 980 ]: Total training Loss: 0.00 [iter: 990 ]: Total training Loss: 0.00
Questions:
- How good are reconstructions? Why?
# Encode and plot
encode_training_images(vae_3, # The third VAE, trained only with the regularizer (no reconstruction loss).
train_imgs_flat,
train_lbls,
batch_size=100,
total_iterations=200,
plot_2d_embedding=True,
plot_hist_mu_std_for_dim=1)
Z-Space and the MEAN of the predicted p(z|x) for each sample (std.devs not shown)
Histogram of values of the predicted MEANS
Histogram of values of the predicted STANDARD DEVIATIONS
Questions:
Below, we train a VAE with a bottleneck layer of 32 dimensions (one mu and one std. dev. predicted per dimension), and train it appropriately, with both the reconstruction loss and the regularizer.
The code is complete. Just run it and observe the results.
# Same as in Task 2, but using a bottleneck with 32 dimensions
# Create the network
rng = np.random.RandomState(seed=SEED)
vae_wide = VAE(rng=rng,
D_in=H_height*W_width,
D_hid_enc=256,
D_bottleneck=32, # <-----------------------------------
D_hid_dec=256)
# Start training
unsupervised_training_VAE(vae_wide,
vae_loss,
1.0, # lambda_rec: weight on the reconstruction loss.
0.005, # lambda_reg: 0.005 works well for synthesis! 0.0005 is better for smooth z values with 32 dims.
rng,
train_imgs_flat,
batch_size=40,
learning_rate=3e-3, # 3e-3
total_iters=1000,
iters_per_recon_plot=50)
[iter: 0 ]: Total training Loss: 0.93
[iter: 10 ]: Total training Loss: 0.43 [iter: 20 ]: Total training Loss: 0.33 [iter: 30 ]: Total training Loss: 0.31 [iter: 40 ]: Total training Loss: 0.27 [iter: 50 ]: Total training Loss: 0.28
[iter: 60 ]: Total training Loss: 0.28 [iter: 70 ]: Total training Loss: 0.25 [iter: 80 ]: Total training Loss: 0.27 [iter: 90 ]: Total training Loss: 0.26 [iter: 100 ]: Total training Loss: 0.25
[iter: 110 ]: Total training Loss: 0.25 [iter: 120 ]: Total training Loss: 0.27 [iter: 130 ]: Total training Loss: 0.28 [iter: 140 ]: Total training Loss: 0.24 [iter: 150 ]: Total training Loss: 0.25
[iter: 160 ]: Total training Loss: 0.26 [iter: 170 ]: Total training Loss: 0.25 [iter: 180 ]: Total training Loss: 0.27 [iter: 190 ]: Total training Loss: 0.28 [iter: 200 ]: Total training Loss: 0.25
[iter: 210 ]: Total training Loss: 0.23 [iter: 220 ]: Total training Loss: 0.26 [iter: 230 ]: Total training Loss: 0.24 [iter: 240 ]: Total training Loss: 0.25 [iter: 250 ]: Total training Loss: 0.26
[iter: 260 ]: Total training Loss: 0.25 [iter: 270 ]: Total training Loss: 0.25 [iter: 280 ]: Total training Loss: 0.24 [iter: 290 ]: Total training Loss: 0.26 [iter: 300 ]: Total training Loss: 0.25
[iter: 310 ]: Total training Loss: 0.25 [iter: 320 ]: Total training Loss: 0.27 [iter: 330 ]: Total training Loss: 0.25 [iter: 340 ]: Total training Loss: 0.24 [iter: 350 ]: Total training Loss: 0.26
[iter: 360 ]: Total training Loss: 0.25 [iter: 370 ]: Total training Loss: 0.23 [iter: 380 ]: Total training Loss: 0.24 [iter: 390 ]: Total training Loss: 0.26 [iter: 400 ]: Total training Loss: 0.23
[iter: 410 ]: Total training Loss: 0.26 [iter: 420 ]: Total training Loss: 0.23 [iter: 430 ]: Total training Loss: 0.25 [iter: 440 ]: Total training Loss: 0.23 [iter: 450 ]: Total training Loss: 0.24
[iter: 460 ]: Total training Loss: 0.22 [iter: 470 ]: Total training Loss: 0.23 [iter: 480 ]: Total training Loss: 0.22 [iter: 490 ]: Total training Loss: 0.23 [iter: 500 ]: Total training Loss: 0.26
[iter: 510 ]: Total training Loss: 0.23 [iter: 520 ]: Total training Loss: 0.22 [iter: 530 ]: Total training Loss: 0.24 [iter: 540 ]: Total training Loss: 0.21 [iter: 550 ]: Total training Loss: 0.24
[iter: 560 ]: Total training Loss: 0.23 [iter: 570 ]: Total training Loss: 0.23 [iter: 580 ]: Total training Loss: 0.22 [iter: 590 ]: Total training Loss: 0.22 [iter: 600 ]: Total training Loss: 0.21
[iter: 610 ]: Total training Loss: 0.20 [iter: 620 ]: Total training Loss: 0.21 [iter: 630 ]: Total training Loss: 0.22 [iter: 640 ]: Total training Loss: 0.23 [iter: 650 ]: Total training Loss: 0.20
[iter: 660 ]: Total training Loss: 0.20 [iter: 670 ]: Total training Loss: 0.24 [iter: 680 ]: Total training Loss: 0.21 [iter: 690 ]: Total training Loss: 0.21 [iter: 700 ]: Total training Loss: 0.22
[iter: 710 ]: Total training Loss: 0.23 [iter: 720 ]: Total training Loss: 0.21 [iter: 730 ]: Total training Loss: 0.20 [iter: 740 ]: Total training Loss: 0.21 [iter: 750 ]: Total training Loss: 0.21
[iter: 760 ]: Total training Loss: 0.21 [iter: 770 ]: Total training Loss: 0.21 [iter: 780 ]: Total training Loss: 0.21 [iter: 790 ]: Total training Loss: 0.21 [iter: 800 ]: Total training Loss: 0.22
[iter: 810 ]: Total training Loss: 0.22 [iter: 820 ]: Total training Loss: 0.19 [iter: 830 ]: Total training Loss: 0.21 [iter: 840 ]: Total training Loss: 0.21 [iter: 850 ]: Total training Loss: 0.20
[iter: 860 ]: Total training Loss: 0.21 [iter: 870 ]: Total training Loss: 0.21 [iter: 880 ]: Total training Loss: 0.20 [iter: 890 ]: Total training Loss: 0.20 [iter: 900 ]: Total training Loss: 0.21
[iter: 910 ]: Total training Loss: 0.21 [iter: 920 ]: Total training Loss: 0.20 [iter: 930 ]: Total training Loss: 0.19 [iter: 940 ]: Total training Loss: 0.20 [iter: 950 ]: Total training Loss: 0.21
[iter: 960 ]: Total training Loss: 0.20 [iter: 970 ]: Total training Loss: 0.21 [iter: 980 ]: Total training Loss: 0.20 [iter: 990 ]: Total training Loss: 0.20
Questions:
Below we will use a VAE to generate new data.

A trained VAE has learned, via the regularizer, to encode samples in such a way that the distribution of codes z matches the 'prior' distribution p(z)=N(0,I) (a Gaussian with mean 0 and standard deviation 1 in every dimension of the space Z).
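As an aside, the regularizer that pushes the codes toward p(z)=N(0,I) is, in the standard VAE formulation, the KL divergence between the predicted Gaussian and the prior, which has a closed form for diagonal Gaussians. Below is a minimal numpy sketch of that closed form; the exact `vae_loss` in `./utils` may weight or implement it slightly differently.

```python
import numpy as np

def kl_to_standard_normal(z_mu, z_logstd):
    # KL( N(mu, std^2) || N(0, I) ), summed over the z dimensions and
    # averaged over the batch. Closed form for diagonal Gaussians:
    #   0.5 * sum( mu^2 + std^2 - 2*log(std) - 1 )
    z_std = np.exp(z_logstd)
    kl_per_sample = 0.5 * np.sum(z_mu**2 + z_std**2 - 2.0*z_logstd - 1.0, axis=1)
    return np.mean(kl_per_sample)

# When the predicted mu is 0 and std is 1 everywhere, the KL is exactly 0,
# i.e. the encoder's output already matches the prior:
print(kl_to_standard_normal(np.zeros([4, 2]), np.zeros([4, 2])))  # → 0.0
```

This is why, in the Task above that trains with the regularizer only, the printed total loss is ~0: the encoder can trivially predict mu=0 and std=1 for every input.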
To synthesize new data:
FILL IN THE BLANKS in the below code, to enable it to sample from the N(0,I) normal distribution to synthesize data:
def synthesize(enc_dec_net,
               rng,
               n_samples):
    # enc_dec_net: Network with encoder and decoder, pretrained.
    # rng: numpy RandomState, for reproducible sampling.
    # n_samples: how many samples to produce.
    z_dims = enc_dec_net.D_bottleneck # Dimensionality of z codes (and input to decoder).
    ############################## TODO: Fill in the blanks #############################
    # Create samples of z from Gaussian N(0,I), where means are 0 and standard deviations are 1 in all dimensions.
    z_samples = rng.normal(loc=0., scale=1., size=[n_samples, z_dims])
    #####################################################################################
    z_samples_t = torch.tensor(z_samples, dtype=torch.float)
    x_samples = enc_dec_net.decode(z_samples_t)
    x_samples_np = x_samples if type(x_samples) is np.ndarray else x_samples.detach().numpy() # torch to numpy
    for x_sample in x_samples_np:
        plot_image(x_sample.reshape([H_height, W_width]))
# Let's finally run the synthesis and see what happens...
rng = np.random.RandomState(seed=SEED)
synthesize(vae_wide,
rng,
n_samples=20)
If everything was filled in correctly, you should see above the images created by the decoder for the randomly sampled z-codes.
Questions:
Given an input x, the encoder of a VAE predicts the distribution p(z|x), which describes which values of z are the "probable" codes for x. During training, z codes are sampled from p(z|x) via the reparameterization trick, and the decoder is trained to decode all of them to reconstruct x.
If p(z|x) is parameterized as a Gaussian, as is commonly done in VAEs (and in this tutorial), the predicted mean is the most probable code and will also be sampled most often. The probability of a code being sampled decreases as we move away from the mean, at a rate governed by the predicted standard deviation of p(z|x). One may therefore wonder how well the decoder learns to decode z codes drawn from the whole of p(z|x) (not just the mean), and how the resulting reconstructions look. We explore this here.
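The reparameterization trick mentioned above can be sketched in a few lines: rather than sampling z directly from N(mu, std^2), which would block gradients, we sample eps from N(0,I) and compute z = mu + std*eps, which is differentiable with respect to mu and log(std). A small torch sketch (the function name is illustrative, not one of the tutorial's helpers):

```python
import torch

def reparameterize(z_mu, z_logstd):
    # z = mu + std * eps, with eps ~ N(0, I).
    # Gradients flow back to z_mu and z_logstd; eps is treated as a constant.
    eps = torch.randn_like(z_mu)
    return z_mu + torch.exp(z_logstd) * eps

z_mu = torch.zeros(5, 2, requires_grad=True)
z_logstd = torch.zeros(5, 2, requires_grad=True)
z = reparameterize(z_mu, z_logstd)
z.sum().backward()       # gradients reach mu and logstd through the sample
print(z_mu.grad.shape)   # torch.Size([5, 2])
```

Had we instead called a non-differentiable sampler directly on (mu, std), backpropagation through the encoder would not be possible.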
In the below:
FILL IN THE BLANKS below, to enable the code to sample from predicted distribution p(z|x) for each sample x:
def sample_variations_of_x(enc_dec_net,
                           imgs_flat,
                           idx_img_x,
                           rng,
                           n_samples):
    # enc_dec_net: Network with encoder and decoder, pretrained.
    # imgs_flat: flattened images, shape [number of images, H * W]
    # idx_img_x: index of the image x to encode: x = imgs_flat[idx_img_x]
    # rng: numpy RandomState, for reproducible sampling.
    # n_samples: how many samples to produce.
    img_x_nparray = imgs_flat[idx_img_x:idx_img_x+1] # Shape: [num samples = 1, H * W]
    # Encode:
    z_mu, z_logstd = enc_dec_net.encode(img_x_nparray) # returns arrays of shape [N, dims_z]
    z_dims = z_mu.shape[1] # Dimensionality of z codes (and input to decoder).
    z_mu = z_mu.detach().numpy() # Make pytorch tensor a numpy array
    z_logstd = z_logstd.detach().numpy()
    ############# TODO: Fill in the blanks ##################################
    # Sample z values from the predicted probability of z for this sample x: p(z|x) = N(mu(x), std^2(x))
    z_std = np.exp(z_logstd) # <------ what you need is returned by the encoding above -------------
    z_samples = rng.normal(loc=z_mu, scale=z_std, size=[n_samples, z_dims]) # <------------------
    #########################################################################
    x_samples = enc_dec_net.decode(z_samples)
    x_samples_np = x_samples if type(x_samples) is np.ndarray else x_samples.detach().numpy() # torch to numpy
    print("Real input to encoder:")
    plot_image(img_x_nparray.reshape([H_height, W_width]))
    print("Reconstructions based on samples from p(z|x=input):")
    plot_grid_of_images(x_samples_np.reshape([n_samples, H_height, W_width]),
                        n_imgs_per_row=10,
                        dynamically=False)
    print("Going to plot all the reconstructed variations one by one, for easier visual investigation:")
    for x_sample in x_samples_np:
        plot_image(x_sample.reshape([H_height, W_width]))
# Let's finally run the synthesis and see what happens...
rng = np.random.RandomState(seed=SEED)
sample_variations_of_x(vae_wide, # The VAE with 32 dimensional Z.
train_imgs_flat,
idx_img_x=1, # We will encode the image with index 1, and then reconstruct it.
rng=rng,
n_samples=100)
Real input to encoder:
Reconstructions based on samples from p(z|x=input): n_rows= 10
Going to plot all the reconstructed variations one by one, for easier visual investigation:
Questions:
Here, we want to create variations of an input more "systematically" (not random as above). We want to create images that look partly as an input x1 and partly as an input x2, by interpolating between x1 and x2 in the latent space of Z codes.
Steps:
The code below is complete. Run it and observe the output.
def interpolate_between_x1_x2(enc_dec_net,
                              imgs_flat,
                              idx_x1,
                              idx_x2,
                              rng):
    # enc_dec_net: Network with encoder and decoder, pretrained.
    # imgs_flat: [number of images, H * W]
    # idx_x1: index of x1: x1 = imgs_flat[idx_x1]
    # idx_x2: index of x2: x2 = imgs_flat[idx_x2]
    img_x1_nparray = imgs_flat[idx_x1]
    img_x2_nparray = imgs_flat[idx_x2]
    z_mus, z_logstds = enc_dec_net.encode(np.array([img_x1_nparray, img_x2_nparray]))
    z_mus = z_mus.detach().numpy()
    z_mu1 = z_mus[0] # np vector with [z-dims] elements
    z_mu2 = z_mus[1]
    z_dims = z_mu1.shape[0] # Dimensionality of z codes (and input to decoder).
    # Reconstruct x1 and x2 based on mu codes:
    x_samples = enc_dec_net.decode(np.array([z_mu1, z_mu2]))
    x_samples = x_samples.detach().numpy()
    x1_rec = x_samples[0]
    x2_rec = x_samples[1]
    # Interpolate:
    alphas = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    alphas_np = np.ones([11, z_dims], dtype="float16") # [number of interpolated samples = 11, z-dimensions]
    for row_idx in range(alphas_np.shape[0]):
        alphas_np[row_idx] = alphas_np[row_idx] * alphas[row_idx] # now whole 1st row == 0.0, 2nd row == 0.1, ...
    # Interpolate new z values
    zs_to_decode = z_mu1 + alphas_np * (z_mu2 - z_mu1)
    x_samples = enc_dec_net.decode(zs_to_decode)
    x_samples_np = x_samples if type(x_samples) is np.ndarray else x_samples.detach().numpy() # torch to numpy
    print("Inputs to encoder:")
    plot_images([img_x1_nparray.reshape([H_height, W_width]), img_x2_nparray.reshape([H_height, W_width])],
                titles=["Real x1", "Real x2"])
    print("Reconstructions of x1 and x2 based on their most likely predicted z codes (corresponding mus):")
    plot_images([x1_rec.reshape([H_height, W_width]), x2_rec.reshape([H_height, W_width])],
                titles=["Recon of x1", "Recon of x2"])
    print("Decodings based on z samples interpolated between mu(x1) and mu(x2) predicted by encoder:")
    plot_grid_of_images(x_samples_np.reshape([11, H_height, W_width]),
                        n_imgs_per_row=11,
                        dynamically=False)
    print("Going to plot all the reconstructed variations one by one, for easier visual investigation:")
    for x_sample in x_samples_np:
        plot_image(x_sample.reshape([H_height, W_width]))
# Let's finally run the synthesis and see what happens...
rng = np.random.RandomState(seed=SEED)
interpolate_between_x1_x2(vae_wide,
train_imgs_flat,
idx_x1=1,
idx_x2=3,
rng=rng)
Inputs to encoder:
Reconstructions of x1 and x2 based on their most likely predicted z codes (corresponding mus):
Decodings based on z samples interpolated between mu(x1) and mu(x2) predicted by encoder: n_rows= 1
Going to plot all the reconstructed variations one by one, for easier visual investigation:
Questions:
In the previous tutorial we saw two approaches for using a pre-trained AE to improve the performance of a supervised classifier when labelled data are limited. Approach 1: use the weights of the AE's encoder as a "frozen" feature extractor, with the classifier attached and trained on top. Approach 2: use the weights of the AE's encoder to "initialize" the corresponding layers of a classifier, and then "refine" the whole classifier with the labelled data.
We saw that this clearly improved performance when done with a basic auto-encoder.
Here, we will attempt exactly the same with a VAE.
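The two approaches can be sketched with a toy torch encoder as follows. The `nn.Sequential` stand-in and its layer sizes are illustrative only; in the tutorial, the encoder weights would come from the pretrained VAE.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained encoder (784 -> 256 -> 32, as in this tutorial's sizes),
# plus a linear classification head mapping the 32-dim code to 10 classes.
encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 32))
classifier_head = nn.Linear(32, 10)

# Approach 1: frozen feature extractor -- only the head's parameters are optimized.
for p in encoder.parameters():
    p.requires_grad = False
opt_frozen = torch.optim.Adam(classifier_head.parameters(), lr=3e-3)

# Approach 2: initialize-and-refine -- unfreeze the encoder and optimize everything.
for p in encoder.parameters():
    p.requires_grad = True
opt_refine = torch.optim.Adam(
    list(encoder.parameters()) + list(classifier_head.parameters()), lr=3e-3)

x = torch.randn(8, 784)                  # a dummy batch of 8 flattened images
logits = classifier_head(encoder(x))
print(logits.shape)                      # torch.Size([8, 10])
```

In this tutorial's code below, Approach 1 is realized implicitly: the VAE's parameters are simply never handed to the optimizer, so only the classifier is updated.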
In this task, we will create and train a fully-supervised MLP classifier on only very limited (100) labelled data. This lets us compare its performance against what we achieve when complementing it with unlabelled data via a VAE (in a later Task).

The below code for creating a classifier, cross entropy loss, and training loop is complete.
This is EXACTLY the same as the code of corresponding Task 6 of the previous Tutorial. Just run it.
class Classifier_3layers(Network):
    def __init__(self, D_in, D_hid_1, D_hid_2, D_out, rng):
        # === NOTE: Notice that this is exactly the same architecture as encoder of AE in Task 4 ====
        w_1_init = rng.normal(loc=0.0, scale=0.01, size=(D_in+1, D_hid_1))
        w_2_init = rng.normal(loc=0.0, scale=0.01, size=(D_hid_1+1, D_hid_2))
        w_out_init = rng.normal(loc=0.0, scale=0.01, size=(D_hid_2+1, D_out))
        w_1 = torch.tensor(w_1_init, dtype=torch.float, requires_grad=True)
        w_2 = torch.tensor(w_2_init, dtype=torch.float, requires_grad=True)
        w_out = torch.tensor(w_out_init, dtype=torch.float, requires_grad=True)
        self.params = [w_1, w_2, w_out]

    def forward_pass(self, batch_inp):
        # compute predicted y
        [w_1, w_2, w_out] = self.params
        # In case input is a numpy array (e.g. an image batch), make it a tensor.
        batch_imgs_t = torch.tensor(batch_inp, dtype=torch.float) if type(batch_inp) is np.ndarray else batch_inp
        unary_feature_for_bias = torch.ones(size=(batch_imgs_t.shape[0], 1)) # [N, 1] column vector.
        x = torch.cat((batch_imgs_t, unary_feature_for_bias), dim=1) # Extra feature=1 for bias.
        # === NOTE: This is the same architecture as encoder of AE in Task 4, with extra classification layer ===
        # Layer 1
        h1_preact = x.mm(w_1)
        h1_act = h1_preact.clamp(min=0)
        # Layer 2 (corresponds to bottleneck of the AE):
        h1_ext = torch.cat((h1_act, unary_feature_for_bias), dim=1)
        h2_preact = h1_ext.mm(w_2)
        h2_act = h2_preact.clamp(min=0)
        # Output classification layer
        h2_ext = torch.cat((h2_act, unary_feature_for_bias), dim=1)
        h_out = h2_ext.mm(w_out)
        logits = h_out
        # === Addition of a softmax activation, to turn logits into class-posterior probabilities ===
        exp_logits = torch.exp(logits)
        y_pred = exp_logits / torch.sum(exp_logits, dim=1, keepdim=True)
        # Sum with keepdim=True returns a [N,1] array. It would be [N] if keepdim=False.
        # Torch broadcasts [N,1] to [N,D_out] via repetition, to divide exp_logits (which is [N,D_out]) elementwise.
        return y_pred

def cross_entropy(y_pred, y_real, eps=1e-7):
    # y_pred: Predicted class-posterior probabilities, returned by forward_pass. Array of shape [N, D_out]
    # y_real: One-hot representation of real training labels. Same shape as y_pred.
    # If a numpy array is given, change it to a Torch tensor.
    y_pred = torch.tensor(y_pred, dtype=torch.float) if type(y_pred) is np.ndarray else y_pred
    y_real = torch.tensor(y_real, dtype=torch.float) if type(y_real) is np.ndarray else y_real
    x_entr_per_sample = - torch.sum( y_real*torch.log(y_pred+eps), dim=1) # Sum over classes, axis=1
    loss = torch.mean(x_entr_per_sample, dim=0) # Expectation of loss: Mean over samples (axis=0).
    return loss
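One caveat: exponentiating raw logits, as `forward_pass` does above, can overflow once the logits grow large. A common remedy (shown here only as an aside, not part of the tutorial's code) is to subtract the per-row maximum before exponentiating, which leaves the softmax output mathematically unchanged:

```python
import numpy as np

def stable_softmax(logits):
    # Softmax is invariant to adding a constant to each row, so subtracting
    # the row-wise max does not change the result, but it keeps the
    # exponentials in a safe numeric range (all exponents are <= 0).
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

logits = np.array([[1000.0, 1001.0]])  # naive np.exp(1000) would overflow to inf
print(stable_softmax(logits))          # roughly [[0.269, 0.731]]
```

With the small logits produced by this tutorial's 0.01-scale weight initialization, the naive version works fine, which is why the simpler form is used above.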
from utils.plotting import plot_train_progress_2
def train_classifier(classifier,
                     pretrained_VAE,
                     loss_func,
                     rng,
                     train_imgs,
                     train_lbls,
                     test_imgs,
                     test_lbls,
                     batch_size,
                     learning_rate,
                     total_iters,
                     iters_per_test=-1):
    # Arguments:
    # classifier: A classifier network. It will be trained by this function using labelled data.
    #             Its input will be either original data (if pretrained_VAE=None), ...
    #             ... or the output of the feature extractor if one is given.
    # pretrained_VAE: A pretrained VAE that will *not* be trained here.
    #                 It will be used to encode input data.
    #                 The classifier will take as input the output of this feature extractor.
    #                 If pretrained_VAE = None: The classifier will simply receive the actual data as input.
    # train_imgs: Vectorized training images
    # train_lbls: One hot labels
    # test_imgs: Vectorized testing images, to compute generalization accuracy.
    # test_lbls: One hot labels for test data.
    # batch_size: batch size
    # learning_rate: learning rate for the Adam optimizer.
    # total_iters: how many SGD iterations to perform.
    # iters_per_test: We will 'test' the model on test data every few iterations as specified by this.
    values_to_plot = {'loss':[], 'acc_train': [], 'acc_test': []}
    optimizer = optim.Adam(classifier.params, lr=learning_rate)
    for t in range(total_iters):
        # Sample batch for this SGD iteration
        train_imgs_batch, train_lbls_batch = get_random_batch(train_imgs, train_lbls, batch_size, rng)
        # Forward pass
        if pretrained_VAE is None:
            inp_to_classifier = train_imgs_batch
        else:
            ############### TODO FOR TASK-11 #########################################
            # FILL IN THE BLANK, to provide as input to the classifier the predicted MEAN of p(z|x) for each x.
            # Why? Because the mean is the most likely (probable) code z for x!!
            z_codes_mu, z_codes_logstd = pretrained_VAE.encode(train_imgs_batch) # VAE encodes. Output will be given to Classifier
            inp_to_classifier = z_codes_mu # <----------------------------------------
            ############################################################################
        y_pred = classifier.forward_pass(inp_to_classifier)
        # Compute loss:
        y_real = train_lbls_batch
        loss = loss_func(y_pred, y_real) # Cross entropy
        # Backprop and updates.
        optimizer.zero_grad()
        grads = classifier.backward_pass(loss)
        optimizer.step()
        # ==== Report training loss and accuracy ======
        # y_pred and loss can be either np.array or torch.tensor. If tensor, make it np.array.
        y_pred_numpy = y_pred if type(y_pred) is np.ndarray else y_pred.detach().numpy()
        y_pred_lbls = np.argmax(y_pred_numpy, axis=1) # y_pred is soft/probability. Make it a hard one-hot label.
        y_real_lbls = np.argmax(y_real, axis=1)
        acc_train = np.mean(y_pred_lbls == y_real_lbls) * 100. # percentage
        loss_numpy = loss if isinstance(loss, float) else loss.item()
        if t%10 == 0:
            print("[iter:", t, "]: Training Loss: {0:.2f}".format(loss_numpy), "\t Accuracy: {0:.2f}".format(acc_train))
        # =============== Every few iterations, test accuracy ================#
        if t==total_iters-1 or t%iters_per_test == 0:
            if pretrained_VAE is None:
                inp_to_classifier_test = test_imgs
            else:
                z_codes_test_mu, z_codes_test_logstd = pretrained_VAE.encode(test_imgs)
                inp_to_classifier_test = z_codes_test_mu
            y_pred_test = classifier.forward_pass(inp_to_classifier_test)
            # ==== Report test accuracy ======
            y_pred_test_numpy = y_pred_test if type(y_pred_test) is np.ndarray else y_pred_test.detach().numpy()
            y_pred_lbls_test = np.argmax(y_pred_test_numpy, axis=1)
            y_real_lbls_test = np.argmax(test_lbls, axis=1)
            acc_test = np.mean(y_pred_lbls_test == y_real_lbls_test) * 100.
            print("\t\t\t\t\t\t\t\t Testing Accuracy: {0:.2f}".format(acc_test))
            # Keep list of metrics to plot progress.
            values_to_plot['loss'].append(loss_numpy)
            values_to_plot['acc_train'].append(acc_train)
            values_to_plot['acc_test'].append(acc_test)
    # At the end of the process, plot loss and accuracy on training and testing data.
    plot_train_progress_2(values_to_plot['loss'], values_to_plot['acc_train'], values_to_plot['acc_test'], iters_per_test)
Below, we create an instance of this 3-layer classifier and train it on 100 labelled samples. We evaluate generalization on the test samples.
# Train Classifier from scratch (initialized randomly)
# Create the network
rng = np.random.RandomState(seed=SEED)
net_classifier_from_scratch = Classifier_3layers(D_in=H_height*W_width,
D_hid_1=256,
D_hid_2=32,
D_out=C_classes,
rng=rng)
# Start training
train_classifier(net_classifier_from_scratch,
None, # No pretrained AE
cross_entropy,
rng,
train_imgs_flat[:100],
train_lbls_onehot[:100],
test_imgs_flat,
test_lbls_onehot,
batch_size=40,
learning_rate=3e-3,
total_iters=1000,
iters_per_test=20)
[iter: 0 ]: Training Loss: 2.30 Accuracy: 7.50 Testing Accuracy: 11.88 [iter: 10 ]: Training Loss: 2.17 Accuracy: 30.00 [iter: 20 ]: Training Loss: 1.71 Accuracy: 40.00 Testing Accuracy: 30.05 [iter: 30 ]: Training Loss: 1.33 Accuracy: 52.50 [iter: 40 ]: Training Loss: 1.31 Accuracy: 47.50 Testing Accuracy: 36.72 [iter: 50 ]: Training Loss: 0.80 Accuracy: 70.00 [iter: 60 ]: Training Loss: 0.75 Accuracy: 75.00 Testing Accuracy: 45.20 [iter: 70 ]: Training Loss: 0.49 Accuracy: 85.00 [iter: 80 ]: Training Loss: 0.33 Accuracy: 92.50 Testing Accuracy: 50.10 [iter: 90 ]: Training Loss: 0.34 Accuracy: 85.00 [iter: 100 ]: Training Loss: 0.25 Accuracy: 100.00 Testing Accuracy: 51.53 [iter: 110 ]: Training Loss: 0.10 Accuracy: 100.00 [iter: 120 ]: Training Loss: 0.07 Accuracy: 100.00 Testing Accuracy: 53.92 [iter: 130 ]: Training Loss: 0.05 Accuracy: 100.00 [iter: 140 ]: Training Loss: 0.05 Accuracy: 100.00 Testing Accuracy: 53.62 [iter: 150 ]: Training Loss: 0.03 Accuracy: 100.00 [iter: 160 ]: Training Loss: 0.02 Accuracy: 100.00 Testing Accuracy: 53.95 [iter: 170 ]: Training Loss: 0.01 Accuracy: 100.00 [iter: 180 ]: Training Loss: 0.01 Accuracy: 100.00 Testing Accuracy: 54.78 [iter: 190 ]: Training Loss: 0.01 Accuracy: 100.00 [iter: 200 ]: Training Loss: 0.01 Accuracy: 100.00 Testing Accuracy: 54.37 [iter: 210 ]: Training Loss: 0.01 Accuracy: 100.00 [iter: 220 ]: Training Loss: 0.01 Accuracy: 100.00 Testing Accuracy: 54.62 [iter: 230 ]: Training Loss: 0.00 Accuracy: 100.00 [iter: 240 ]: Training Loss: 0.00 Accuracy: 100.00 Testing Accuracy: 54.74 [iter: 250 ]: Training Loss: 0.00 Accuracy: 100.00 [iter: 260 ]: Training Loss: 0.00 Accuracy: 100.00 Testing Accuracy: 54.77 [iter: 270 ]: Training Loss: 0.00 Accuracy: 100.00 [iter: 280 ]: Training Loss: 0.00 Accuracy: 100.00 Testing Accuracy: 54.56 [iter: 290 ]: Training Loss: 0.00 Accuracy: 100.00 [iter: 300 ]: Training Loss: 0.00 Accuracy: 100.00 Testing Accuracy: 54.69 [iter: 310 ]: Training Loss: 0.00 Accuracy: 100.00 
[iter: 320 ]: Training Loss: 0.00 Accuracy: 100.00 Testing Accuracy: 55.02
[... training loss stays at 0.00 and training accuracy at 100.00 for the remaining iterations; testing accuracy plateaus around 55 ...]
[iter: 990 ]: Training Loss: 0.00 Accuracy: 100.00 Testing Accuracy: 55.60
This is "exactly" the same as the corresponding Task 6 in the previous Tutorial. Simply run it, and note down the final Accuracy on the Test data.
Approach-1: We take the encoder of the pre-trained VAE and place an untrained, small (1-layer) Classifier on top. The Classifier receives as input the codes that the VAE's encoder predicts when given input x. We then use the limited labelled data for training. Importantly, we only train the small Classifier. The encoder is used as a 'frozen' feature extractor (it does not get trained further). See the next figure for a visual explanation.

TODO: This is the same as Task 7 of the previous Tutorial on AEs, with one important peculiarity: since the encoder predicts a whole distribution of codes p(z|x) for each x, which code should we use as output of the "feature extractor" (the VAE's encoder) and as input to the classifier? Note: we want the classifier to be "deterministic", not stochastic, so we won't be sampling. Probably we want the most probable code z for each x...
Go back to Task 10 and the function train_classifier(...) defined therein. Fill in the gap, choosing which code to use as input to the Classifier. AFTER you have done that, run the code below...
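As a hint for the choice above: for a Gaussian p(z|x), the most probable code is its mean. The toy NumPy sketch below (the `encode` helper and all names are illustrative, not part of the tutorial's codebase) contrasts the stochastic code used while training the VAE with the deterministic code we would feed a classifier:

```python
import numpy as np

rng = np.random.RandomState(0)

def encode(x, w_mu, w_logstd):
    # Toy linear VAE encoder: predicts mean and log-std of p(z|x).
    return x.dot(w_mu), x.dot(w_logstd)

N, D_in, D_code = 4, 8, 3
x = rng.randn(N, D_in)
w_mu = rng.randn(D_in, D_code) * 0.1
w_logstd = rng.randn(D_in, D_code) * 0.1

mu, log_std = encode(x, w_mu, w_logstd)

# Stochastic code (what the VAE samples during its own training):
z_sample = mu + np.exp(log_std) * rng.randn(N, D_code)

# Deterministic code to feed the classifier: the mean of p(z|x),
# i.e. the most probable z for this x under the Gaussian posterior.
z_for_classifier = mu
```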
# Train classifier on top of the pre-trained VAE encoder
class Classifier_1layer(Network):
    # Classifier with just 1 layer, the classification layer.
    def __init__(self, D_in, D_out, rng):
        # D_in: dimensionality of the input
        # D_out: dimensionality of the output (number of classes)
        w_out_init = rng.normal(loc=0.0, scale=0.01, size=(D_in+1, D_out))
        w_out = torch.tensor(w_out_init, dtype=torch.float, requires_grad=True)
        self.params = [w_out]

    def forward_pass(self, batch_inp):
        # Compute predicted y.
        [w_out] = self.params
        # In case the input is a numpy array (e.g. an image), make it a tensor.
        batch_inp_t = torch.tensor(batch_inp, dtype=torch.float) if isinstance(batch_inp, np.ndarray) else batch_inp
        unary_feature_for_bias = torch.ones(size=(batch_inp_t.shape[0], 1))  # [N, 1] column vector.
        batch_inp_ext = torch.cat((batch_inp_t, unary_feature_for_bias), dim=1)  # Extra feature=1 for bias.
        # Output classification layer.
        logits = batch_inp_ext.mm(w_out)
        # Output layer activation function: softmax.
        exp_logits = torch.exp(logits)
        y_pred = exp_logits / torch.sum(exp_logits, dim=1, keepdim=True)
        # sum with keepdim=True returns an [N, 1] array; it would be [N] if keepdim=False.
        # Torch broadcasts [N, 1] to [N, D_out] via repetition, to divide exp_logits (which is [N, D_out]) elementwise.
        return y_pred

# Create the network
rng = np.random.RandomState(seed=SEED)  # Random number generator
# As input, it will receive z-codes from the VAE pre-trained in Task 6.
classifier_1layer = Classifier_1layer(vae_wide.D_bottleneck,  # Input dimension equals the dimensionality of the VAE's z.
                                      C_classes,
                                      rng=rng)

train_classifier(classifier_1layer,
                 vae_wide,  # Pretrained VAE, to use as a frozen feature extractor.
                 cross_entropy,
                 rng,
                 train_imgs_flat[:100],
                 train_lbls_onehot[:100],
                 test_imgs_flat,
                 test_lbls_onehot,
                 batch_size=40,
                 learning_rate=3e-3,
                 total_iters=1000,
                 iters_per_test=20)
[iter: 0 ]: Training Loss: 2.31 Accuracy: 5.00 Testing Accuracy: 7.51
[iter: 100 ]: Training Loss: 1.66 Accuracy: 67.50 Testing Accuracy: 53.64
[iter: 500 ]: Training Loss: 0.99 Accuracy: 75.00 Testing Accuracy: 59.98
[... training loss keeps decreasing slowly; training accuracy fluctuates roughly between 60 and 90; testing accuracy plateaus around 60 ...]
[iter: 990 ]: Training Loss: 0.64 Accuracy: 77.50 Testing Accuracy: 59.89
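A side note on the softmax used in forward_pass above: it exponentiates the raw logits directly, which can overflow for large logit values. A mathematically identical but numerically safer formulation subtracts each row's maximum first. A small self-contained NumPy sketch, independent of the tutorial's classes:

```python
import numpy as np

def softmax_stable(logits):
    # Subtracting the row-wise max leaves the softmax unchanged
    # (the shift cancels between numerator and denominator) but
    # keeps np.exp away from overflow.
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exp = np.exp(shifted)
    # keepdims=True gives an [N, 1] sum that broadcasts over [N, D_out].
    return exp / np.sum(exp, axis=1, keepdims=True)

logits = np.array([[1000.0, 1001.0],   # naive exp() would overflow here
                   [0.0, 1.0]])
probs = softmax_stable(logits)
```

Because softmax is shift-invariant, both rows above yield the same probabilities, and each row sums to 1.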
If you completed the task appropriately, you should see the model being trained and its performance reported in the output above. The expected TRAINING accuracy is approximately 80%, and the TESTING accuracy around 60% at the end of training.
Questions:
Approach-2: The second approach is to build a Classifier that has the same architecture as the encoder of the VAE, followed by an extra classification layer. We first train the VAE (already done in Task 6). Then, we use the pre-trained weights of the VAE's encoder to initialize the corresponding parameters of the Classifier. The classification layer of the Classifier is initialized randomly. Then, with the limited labelled data, we refine (train) all the parameters of the classifier.

This is the same as Task 8 of the previous Tutorial on AEs, with one important peculiarity (related to Task 11 here): since the Classifier needs to be deterministic, we do not use the weights that predict the standard deviation in the VAE's encoder. We only use the neurons that predict the mean of p(z|x) (the most likely code) to initialize the corresponding layers of the supervised Classifier.
The code below is complete.
Read it, understand it, run it, and try to answer the questions below.
# Pre-train a classifier.
# The classifier below has THE SAME architecture as the 3-layer classifier that we trained
# in a purely supervised manner in Task 10.
# This is done by inheriting the class (Classifier_3layers); it therefore uses THE SAME forward_pass() function.
# THE ONLY DIFFERENCE is in the constructor __init__.
# This 'pretrained' classifier receives as input a pre-trained VAE (pretrained_VAE) from Task 6.
# It uses the parameters of the VAE's encoder to initialize its own parameters, instead of random initialization.
# The whole model is then trained end-to-end.
class Classifier_3layers_pretrained(Classifier_3layers):
    def __init__(self, pretrained_VAE, D_in, D_out, rng):
        D_hid_1 = 256
        D_hid_2 = 32
        w_out_init = rng.normal(loc=0.0, scale=0.01, size=(D_hid_2+1, D_out))
        # Pre-trained parameters of the VAE: encoder (w1, w2_mu, w2_std) and decoder (w3, w4).
        [vae_w1, vae_w2_mu, vae_w2_std, vae_w3, vae_w4] = pretrained_VAE.params
        # Initialize the hidden layers from the encoder. Only the weights that predict the
        # mean of p(z|x) are used; the std-predicting weights (vae_w2_std) are discarded.
        w_1 = torch.tensor(vae_w1, dtype=torch.float, requires_grad=True)
        w_2 = torch.tensor(vae_w2_mu, dtype=torch.float, requires_grad=True)
        w_out = torch.tensor(w_out_init, dtype=torch.float, requires_grad=True)
        self.params = [w_1, w_2, w_out]

# Create the network
rng = np.random.RandomState(seed=SEED)  # Random number generator
classifier_3layers_pretrained = Classifier_3layers_pretrained(vae_wide,  # The VAE pre-trained in Task 6.
                                                              train_imgs_flat.shape[1],
                                                              C_classes,
                                                              rng=rng)

# Start training.
# NOTE: Only the 3-layer pretrained classifier is used, and all its parameters will be trained.
# No frozen feature extractor.
train_classifier(classifier_3layers_pretrained,  # Classifier that will be trained.
                 None,  # No pretrained VAE to act as a 'frozen' feature extractor.
                 cross_entropy,
                 rng,
                 train_imgs_flat[:100],
                 train_lbls_onehot[:100],
                 test_imgs_flat,
                 test_lbls_onehot,
                 batch_size=40,
                 learning_rate=3e-3,
                 total_iters=1000,
                 iters_per_test=20)
/tmp/ipykernel_37896/2061375935.py:21: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  w_1 = torch.tensor(vae_w1, dtype=torch.float, requires_grad=True)
/tmp/ipykernel_37896/2061375935.py:22: UserWarning: (same warning as above)
  w_2 = torch.tensor(vae_w2_mu, dtype=torch.float, requires_grad=True)
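The UserWarning above is triggered because torch.tensor(...) is called on objects that are already torch.Tensors (the VAE's parameters). The result is still a valid detached copy, so training proceeds correctly, but PyTorch's recommended idiom for this is clone().detach().requires_grad_(True). A minimal sketch (the variable names are illustrative):

```python
import torch

# Stand-in for a pre-trained VAE weight tensor.
pretrained_w = torch.randn(4, 3, requires_grad=True)

# Recommended way to copy an existing tensor into a fresh, trainable leaf tensor:
w = pretrained_w.clone().detach().requires_grad_(True)

# Same values, but independent storage and no autograd link to the original.
assert torch.equal(w, pretrained_w)
assert w.is_leaf and w.requires_grad
assert w.data_ptr() != pretrained_w.data_ptr()
```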
[iter: 0 ]: Training Loss: 2.31 Accuracy: 0.00 Testing Accuracy: 7.65
[iter: 100 ]: Training Loss: 0.18 Accuracy: 97.50 Testing Accuracy: 62.07
[iter: 280 ]: Training Loss: 0.00 Accuracy: 100.00 Testing Accuracy: 61.56
[... training loss stays at 0.00 and training accuracy at 100.00 thereafter; testing accuracy plateaus around 61.5 ...]
[iter: 990 ]: Training Loss: 0.00 Accuracy: 100.00 Testing Accuracy: 61.65
Questions:
Copyright 2021, University of Birmingham
Tutorial for Neural Computation
For issues e-mail: k.kamnitsas@bham.ac.uk